Analysis of Fatalities, Injuries and Damage Caused by Severe Weather Events

Synopsis

Based on recorded storm data from 1950 through 2011, this document tries to provide some insight into the effects of severe weather on both the economy and the population of the United States of America. These answers can help in planning responses to severe weather events and in preparing contingency plans.

We found that convection events (Lightning, Tornadoes, Thunderstorm Wind, Hail) are the most harmful to public health. We also found that flood events (Flash Floods, River Floods) are the most damaging to property and crops. Further, we looked at which states suffered the most damage to property and crops and which states suffered the most fatalities and injuries.

Data Processing

Load the libraries we will need:

library(ggplot2)
library(maps)
library(mapproj)
library(rCharts)

Our data is derived from the NOAA Storm Database.
Read in the data, mapping as many fields to numerical fields as possible. We are not converting the dates at this point, as we do not need the dates in our analysis. More information about the data file is available from the National Weather Service Storm Data Documentation. The data is read in from a large CSV file containing more than 900,000 observations.
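
The read step itself is not shown in this report. A minimal sketch of how it might look is given below. The helper name read_storm_data, the file path, and the choice of which columns to convert are assumptions for illustration, not the original code.

```r
# Hypothetical helper: the name, the path argument and the list of numeric
# columns are assumptions, not taken from the original analysis.
read_storm_data <- function(path) {
  # Read everything as character first so odd values cannot abort the read
  raw <- read.csv(path, colClasses = "character", stringsAsFactors = FALSE)
  numeric_cols <- intersect(
    c("FATALITIES", "INJURIES", "PROPDMG", "CROPDMG"),
    names(raw)
  )
  # Convert the measurement columns to numeric; dates stay as character
  raw[numeric_cols] <- lapply(raw[numeric_cols], as.numeric)
  raw
}

# Usage on a tiny stand-in file (two made-up rows in the same layout)
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "BGN_DATE,EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP",
  "4/18/1950 0:00:00,TORNADO,0,15,25,K,0,",
  "1/6/1996 0:00:00,WINTER STORM,0,0,0,,0,"
), tmp)
demo <- read_storm_data(tmp)
str(demo)
```

Reading as character and converting only selected columns keeps the date and exponent fields untouched until they are needed.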

The initial set has 902,297 observations. We first throw away all data that does not contain information we are interested in by filtering out events that did not cause fatalities, injuries or damage. Let’s take a look at a summary of the initial data set:

summary(data)
##     STATE__       BGN_DATE           BGN_TIME          TIME_ZONE        
##  Min.   : 1.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:19.0   Class :character   Class :character   Class :character  
##  Median :30.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :31.2                                                           
##  3rd Qu.:45.0                                                           
##  Max.   :95.0                                                           
##                                                                         
##      COUNTY       COUNTYNAME           STATE              EVTYPE         
##  Min.   :  0.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.: 31.0   Class :character   Class :character   Class :character  
##  Median : 75.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :100.6                                                           
##  3rd Qu.:131.0                                                           
##  Max.   :873.0                                                           
##                                                                          
##    BGN_RANGE          BGN_AZI           BGN_LOCATI       
##  Min.   :   0.000   Length:902297      Length:902297     
##  1st Qu.:   0.000   Class :character   Class :character  
##  Median :   0.000   Mode  :character   Mode  :character  
##  Mean   :   1.484                                        
##  3rd Qu.:   1.000                                        
##  Max.   :3749.000                                        
##                                                          
##    END_DATE           END_TIME           COUNTY_END  COUNTYENDN       
##  Length:902297      Length:902297      Min.   :0    Length:902297     
##  Class :character   Class :character   1st Qu.:0    Class :character  
##  Mode  :character   Mode  :character   Median :0    Mode  :character  
##                                        Mean   :0                      
##                                        3rd Qu.:0                      
##                                        Max.   :0                      
##                                                                       
##   END_RANGE           END_AZI           END_LOCATI       
##  Length:902297      Length:902297      Length:902297     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##      LENGTH              WIDTH               F            
##  Min.   :   0.0000   Min.   :   0.000   Length:902297     
##  1st Qu.:   0.0000   1st Qu.:   0.000   Class :character  
##  Median :   0.0000   Median :   0.000   Mode  :character  
##  Mean   :   0.2301   Mean   :   7.503                     
##  3rd Qu.:   0.0000   3rd Qu.:   0.000                     
##  Max.   :2315.0000   Max.   :4400.000                     
##                                                           
##       MAG            FATALITIES          INJURIES        
##  Min.   :    0.0   Min.   :  0.0000   Min.   :   0.0000  
##  1st Qu.:    0.0   1st Qu.:  0.0000   1st Qu.:   0.0000  
##  Median :   50.0   Median :  0.0000   Median :   0.0000  
##  Mean   :   46.9   Mean   :  0.0168   Mean   :   0.1557  
##  3rd Qu.:   75.0   3rd Qu.:  0.0000   3rd Qu.:   0.0000  
##  Max.   :22000.0   Max.   :583.0000   Max.   :1700.0000  
##                                                          
##     PROPDMG         PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Min.   :   0.00   Length:902297      Min.   :  0.000   Length:902297     
##  1st Qu.:   0.00   Class :character   1st Qu.:  0.000   Class :character  
##  Median :   0.00   Mode  :character   Median :  0.000   Mode  :character  
##  Mean   :  12.06                      Mean   :  1.527                     
##  3rd Qu.:   0.50                      3rd Qu.:  0.000                     
##  Max.   :5000.00                      Max.   :990.000                     
##                                                                           
##      WFO             STATEOFFIC         ZONENAMES            LATITUDE   
##  Length:902297      Length:902297      Length:902297      Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.:2802  
##  Mode  :character   Mode  :character   Mode  :character   Median :3540  
##                                                           Mean   :2875  
##                                                           3rd Qu.:4019  
##                                                           Max.   :9706  
##                                                           NA's   :47    
##    LONGITUDE        LATITUDE_E     LONGITUDE_       REMARKS         
##  Min.   :-14451   Min.   :   0   Min.   :-14455   Length:902297     
##  1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0   Class :character  
##  Median :  8707   Median :   0   Median :     0   Mode  :character  
##  Mean   :  6940   Mean   :1452   Mean   :  3509                     
##  3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735                     
##  Max.   : 17124   Max.   :9706   Max.   :106220                     
##                   NA's   :40                                        
##      REFNUM      
##  Min.   :     1  
##  1st Qu.:225575  
##  Median :451149  
##  Mean   :451149  
##  3rd Qu.:676723  
##  Max.   :902297  
## 

We then filter out the unwanted events and keep only those that caused fatalities, injuries, property damage or crop damage.

smallData <- data[data$FATALITIES > 0 | data$INJURIES > 0 | data$PROPDMG > 0 | 
    data$CROPDMG > 0, ]

This leaves us with 254,633 observations.

The EVTYPE field contains a large number of errors and inconsistencies. In order to report on the data, we will add an additional column named category that contains the event category as used by the NCDC:

  • convection
  • extreme temperature
  • flood
  • marine
  • tropical cyclone
  • winter
  • other

This is also the order of importance with which we will treat the various events. Convection events are the most important, so this order will also decide the tie-breaker if an event belongs to more than one category.
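
The categorization code is not shown in this report. The sketch below illustrates one way the ordered tie-breaker could work. The regular expressions and the function name categorize_events are illustrative assumptions, not the rules used in the original analysis.

```r
# Hypothetical patterns, listed from most to least important.
# They are illustrative only and do not reproduce the original rules.
category_patterns <- c(
  "convection"          = "LIGHTNING|TORNADO|THUNDERSTORM|TSTM|HAIL",
  "extreme temperature" = "HEAT|COLD|FREEZE|FROST",
  "flood"               = "FLOOD|FLD",
  "marine"              = "SURF|TIDE|RIP CURRENT|TSUNAMI",
  "tropical cyclone"    = "HURRICANE|TROPICAL|TYPHOON",
  "winter"              = "SNOW|ICE|BLIZZARD|WINTER"
)

categorize_events <- function(evtype) {
  # Anything that matches no pattern stays in "other"
  category <- rep("other", length(evtype))
  # Apply the patterns from least to most important, so that a more
  # important category overwrites a less important one when both match
  for (name in rev(names(category_patterns))) {
    hit <- grepl(category_patterns[[name]], evtype, ignore.case = TRUE)
    category[hit] <- name
  }
  category
}

categorize_events(c("TSTM WIND", "FLASH FLOOD", "TSTM WIND/FLOOD", "FOG"))
```

In this sketch an event such as "TSTM WIND/FLOOD" matches both flood and convection, and convection wins because it comes first in the order of importance.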

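The PROPDMG and CROPDMG fields need some conversion before we can do math on them, because each value must be scaled by the magnitude code stored in PROPDMGEXP or CROPDMGEXP. We convert each code to a numeric multiplier (columns propertydamageEXP and cropdamageEXP) and add two extra columns, propertydamage and cropdamage, that contain the property and crop damage in dollars.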
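
The conversion can be sketched as follows. The mapping of codes to multipliers (H, K, M, B) and the choice to treat every other code as 1 are assumptions. The report does not show the exact rules it used, and the function name exp_to_multiplier is hypothetical.

```r
# Assumed mapping from magnitude code to multiplier. Treating any other
# code (blank, digits, symbols) as 1 is a simplification.
exp_to_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  multiplier <- unname(lookup[toupper(code)])
  multiplier[is.na(multiplier)] <- 1
  multiplier
}

# Made-up rows to show the arithmetic
demo <- data.frame(
  PROPDMG    = c(25, 2.5, 5, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)
demo$propertydamageEXP <- exp_to_multiplier(demo$PROPDMGEXP)
demo$propertydamage    <- demo$PROPDMG * demo$propertydamageEXP
demo
```

The same function would be applied to CROPDMGEXP to obtain cropdamageEXP and cropdamage.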

Having gone through the steps above, we now have a clean dataset containing all the information needed for exploratory data analysis. Let us now begin exploring what the dataset has to say.

Exploratory Data Analysis

We begin by exploring all the variables in the data set.

Univariate Plots

## Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale.

The figure above shows the frequency of the number of events occurring in each county. The mean count is around 22, so on average a county can expect about 22 events over the years 1950-2011.

mean(out$count)
## [1] 22.20378

After adding the category variable let’s take a look at a few stats of the subsetted dataset

summary(smallData)
##     STATE__        BGN_DATE           BGN_TIME          TIME_ZONE        
##  Min.   : 1.00   Length:254633      Length:254633      Length:254633     
##  1st Qu.:19.00   Class :character   Class :character   Class :character  
##  Median :29.00   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :30.12                                                           
##  3rd Qu.:45.00                                                           
##  Max.   :95.00                                                           
##                                                                          
##      COUNTY        COUNTYNAME           STATE              EVTYPE         
##  Min.   :  0.00   Length:254633      Length:254633      Length:254633     
##  1st Qu.: 31.00   Class :character   Class :character   Class :character  
##  Median : 77.00   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 96.26                                                           
##  3rd Qu.:129.00                                                           
##  Max.   :869.00                                                           
##                                                                           
##    BGN_RANGE         BGN_AZI           BGN_LOCATI       
##  Min.   :  0.000   Length:254633      Length:254633     
##  1st Qu.:  0.000   Class :character   Class :character  
##  Median :  0.000   Mode  :character   Mode  :character  
##  Mean   :  1.207                                        
##  3rd Qu.:  1.000                                        
##  Max.   :177.000                                        
##                                                         
##    END_DATE           END_TIME           COUNTY_END  COUNTYENDN       
##  Length:254633      Length:254633      Min.   :0    Length:254633     
##  Class :character   Class :character   1st Qu.:0    Class :character  
##  Mode  :character   Mode  :character   Median :0    Mode  :character  
##                                        Mean   :0                      
##                                        3rd Qu.:0                      
##                                        Max.   :0                      
##                                                                       
##   END_RANGE           END_AZI           END_LOCATI       
##  Length:254633      Length:254633      Length:254633     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##      LENGTH              WIDTH              F            
##  Min.   :   0.0000   Min.   :   0.00   Length:254633     
##  1st Qu.:   0.0000   1st Qu.:   0.00   Class :character  
##  Median :   0.0000   Median :   0.00   Mode  :character  
##  Mean   :   0.6651   Mean   :  21.56                     
##  3rd Qu.:   0.0000   3rd Qu.:   0.00                     
##  Max.   :1845.0000   Max.   :4400.00                     
##                                                          
##       MAG            FATALITIES          INJURIES        
##  Min.   :   0.00   Min.   :  0.0000   Min.   :   0.0000  
##  1st Qu.:   0.00   1st Qu.:  0.0000   1st Qu.:   0.0000  
##  Median :   0.00   Median :  0.0000   Median :   0.0000  
##  Mean   :  31.43   Mean   :  0.0595   Mean   :   0.5519  
##  3rd Qu.:  52.00   3rd Qu.:  0.0000   3rd Qu.:   0.0000  
##  Max.   :3430.00   Max.   :583.0000   Max.   :1700.0000  
##                                                          
##     PROPDMG         PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Min.   :   0.00   Length:254633      Min.   :  0.000   Length:254633     
##  1st Qu.:   2.00   Class :character   1st Qu.:  0.000   Class :character  
##  Median :   5.00   Mode  :character   Median :  0.000   Mode  :character  
##  Mean   :  42.75                      Mean   :  5.411                     
##  3rd Qu.:  25.00                      3rd Qu.:  0.000                     
##  Max.   :5000.00                      Max.   :990.000                     
##                                                                           
##      WFO             STATEOFFIC         ZONENAMES            LATITUDE   
##  Length:254633      Length:254633      Length:254633      Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.:   0  
##  Mode  :character   Mode  :character   Mode  :character   Median :3440  
##                                                           Mean   :2738  
##                                                           3rd Qu.:4002  
##                                                           Max.   :7025  
##                                                           NA's   :4     
##    LONGITUDE        LATITUDE_E     LONGITUDE_       REMARKS         
##  Min.   :-14451   Min.   :   0   Min.   :-14455   Length:254633     
##  1st Qu.:     0   1st Qu.:   0   1st Qu.:     0   Class :character  
##  Median :  8422   Median :   0   Median :     0   Mode  :character  
##  Mean   :  6545   Mean   :1758   Mean   :  4218                     
##  3rd Qu.:  9231   3rd Qu.:3641   3rd Qu.:  8835                     
##  Max.   : 17124   Max.   :7025   Max.   : 17124                     
##                   NA's   :4                                         
##      REFNUM         category         propertydamageEXP  
##  Min.   :     1   Length:254633      Min.   :1.000e+00  
##  1st Qu.:281406   Class :character   1st Qu.:1.000e+03  
##  Median :473485   Mode  :character   Median :1.000e+03  
##  Mean   :484335                      Mean   :2.025e+05  
##  3rd Qu.:703590                      3rd Qu.:1.000e+03  
##  Max.   :902260                      Max.   :1.000e+09  
##                                                         
##  propertydamage      cropdamageEXP         cropdamage       
##  Min.   :0.000e+00   Min.   :1.000e+00   Min.   :0.000e+00  
##  1st Qu.:2.000e+03   1st Qu.:1.000e+00   1st Qu.:0.000e+00  
##  Median :1.000e+04   Median :1.000e+00   Median :0.000e+00  
##  Mean   :1.678e+06   Mean   :3.568e+04   Mean   :1.928e+05  
##  3rd Qu.:3.500e+04   3rd Qu.:1.000e+03   3rd Qu.:0.000e+00  
##  Max.   :1.150e+11   Max.   :1.000e+09   Max.   :5.000e+09  
## 

Let us explore a few univariate plots to find some trend in the data set.

# category is a discrete variable, so count its levels with geom_bar()
ggplot(smallData, aes(x=category)) + geom_bar()+
  xlab("Categories of Events Responsible for most damage")

The figure above is a univariate plot of the number of occurrences of each event category between 1950 and 2011. Convection events occurred most often over these years, with floods in second place. Convection events comprise lightning, tornadoes, thunderstorm wind and hail.

It is also worth exploring the distribution of fatalities and injuries over the years

ggplot(smallData, aes(x=FATALITIES)) + geom_histogram()+
  xlab("Number of Fatalities")+xlim(0,10)

The fatalities plot shows that most events did not cause any fatalities. This is because we are looking at individual events at individual places; yearly totals of fatalities and injuries would add up to much larger numbers. Plotting injuries as well would add little, as both plots would show the same pattern. It is more informative to look at property damage.

# PROPDMG is the raw value before the PROPDMGEXP multiplier is applied
ggplot(smallData, aes(x=PROPDMG)) + geom_histogram()+
  xlab("Property Damage (PROPDMG, unscaled)")+xlim(0,30)

This plot captures more of the data, because property damage occurs even in mild events that pose little threat to the population. The mean of PROPDMG is about 42.75, but this is the unscaled value; after applying the multiplier, the mean property damage per event is about 1.68 million USD (see propertydamage in the summary above).

summary(smallData$PROPDMG)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    2.00    5.00   42.75   25.00 5000.00

Another important aspect of the economic damage caused by natural disasters is crop damage. Let’s take a look at what the graphs have to say:

summary(smallData$CROPDMG)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   0.000   5.411   0.000 990.000
# CROPDMG is the raw value before the CROPDMGEXP multiplier is applied
ggplot(smallData, aes(x=CROPDMG)) + geom_histogram()+
  xlab("Crop Damage (CROPDMG, unscaled)")+xlim(0,30)

Damage to agriculture and crops is lower than property damage.

ll<-data.frame(table(smallData$EVTYPE))
ll<-ll[order(-ll[,2]),]
ll[1:20,]
##                     Var1  Freq
## 418            TSTM WIND 63234
## 362    THUNDERSTORM WIND 43655
## 404              TORNADO 39944
## 132                 HAIL 26130
## 72           FLASH FLOOD 20967
## 251            LIGHTNING 13293
## 379   THUNDERSTORM WINDS 12086
## 85                 FLOOD 10175
## 192            HIGH WIND  5522
## 348          STRONG WIND  3370
## 476         WINTER STORM  1508
## 166           HEAVY SNOW  1342
## 156           HEAVY RAIN  1105
## 468             WILDFIRE   857
## 235            ICE STORM   708
## 453 URBAN/SML STREAM FLD   702
## 58        EXCESSIVE HEAT   698
## 200           HIGH WINDS   657
## 432       TSTM WIND/HAIL   441
## 413       TROPICAL STORM   416

The table above shows the top 20 event types. TSTM WIND and THUNDERSTORM WIND, two spellings of the same kind of event, are the two most frequent event types in the USA. Having explored the most influential variables, let’s look at some bivariate plots to see what more the data has to tell us.

Bivariate Plots

Let’s have a closer look at the regions where many events occurred between 1950 and 2011. I am subsetting the data so as to plot only those counties that have witnessed more than 1500 events.
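
The subsetting code is not shown here. A minimal sketch of counting events per county name and keeping the busiest ones might look like this. The function name busy_counties and its min_events argument are hypothetical.

```r
# Hypothetical helper; min_events mirrors the threshold of 1500 used above.
busy_counties <- function(df, min_events = 1500) {
  # Count rows per county name and turn the table into a data frame
  counts <- as.data.frame(table(df$COUNTYNAME), stringsAsFactors = FALSE)
  names(counts) <- c("county", "count")
  counts <- counts[counts$count > min_events, ]
  # Most frequent county names first
  counts[order(-counts$count), ]
}

# Toy data with a lower threshold so the result is easy to check by eye
toy <- data.frame(
  COUNTYNAME = c("WASHINGTON", "WASHINGTON", "WASHINGTON",
                 "JEFFERSON", "JEFFERSON", "CLARK"),
  stringsAsFactors = FALSE
)
busy_counties(toy, min_events = 1)
```

Because the count is keyed on the name alone, counties that share a name across states are added together in this sketch.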

The county name Washington has the highest number of weather events over these years. Note that COUNTYNAME is matched by name only, so counties called Washington in different states are counted together. Let’s now look at which events occurred most often in Washington, according to the category we have assigned. To investigate further, I subset the data to the county name Washington, keeping only events with fatalities.

washington2<-subset(smallData, 
                    COUNTYNAME == "WASHINGTON" & FATALITIES > 0)
washington1<-table(washington2$category)

# Turn the category table into a data frame of event names and counts
washington<-data.frame(events = names(washington1),
                       count = as.vector(washington1), stringsAsFactors = FALSE)

ggplot(washington, aes(x=events, y = count, fill = count))+
  geom_bar(stat="identity")+xlab("Events")+
  ylab("Number of occurrences")+
  ggtitle("Most prominent Disaster in Washington")

Therefore, we can conclude that the most prominent disaster in Washington is convection. Let us now see whether these events were also the cause of the largest number of fatalities in Washington. The summary of the washington data frame gives us the following values.

summary(washington)
##     events              count      
##  Length:2           Min.   :11.00  
##  Class :character   1st Qu.:16.25  
##  Mode  :character   Median :21.50  
##                     Mean   :21.50  
##                     3rd Qu.:26.75  
##                     Max.   :32.00

Breaking our category variable into individual events, we get the following result

table(washington2$EVTYPE)
## 
## FLASH FLOOD       FLOOD   LIGHTNING     TORNADO   TSTM WIND 
##           7           4           6          22           4

Aggregating the dataset by year, we can plot more histograms showing how the number of weather events has grown over the years. One of my main reasons for doing this is to establish whether the number of events has grown considerably between 1950 and 2011.
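
The yearly aggregation is not shown in this report. It could be sketched as below, assuming that BGN_DATE is a string such as "4/18/1950 0:00:00" with month, day and year first. The function name events_per_year is hypothetical.

```r
# Assumption: BGN_DATE looks like "4/18/1950 0:00:00" (month/day/year first).
events_per_year <- function(df) {
  # Parse only the date part; the trailing time text is ignored
  year <- as.integer(format(as.Date(df$BGN_DATE, format = "%m/%d/%Y"), "%Y"))
  # One row per year with the number of events and the summed fatalities
  aggregate(
    list(events = rep(1L, nrow(df)), fatalities = df$FATALITIES),
    by = list(year = year),
    FUN = sum
  )
}

# Made-up rows, not values from the storm data
toy <- data.frame(
  BGN_DATE   = c("4/18/1950 0:00:00", "6/8/1951 0:00:00", "7/1/1951 0:00:00"),
  FATALITIES = c(0, 2, 1),
  stringsAsFactors = FALSE
)
events_per_year(toy)
```

The resulting data frame can be passed directly to ggplot for a bar or line chart of events or fatalities per year.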

As you can see, over the past few years there has been a considerable increase in the number of events. This suggests that, due to global warming and other environmental degradation, there has been a rise in calamities over the years, although more complete record keeping in later years may also account for part of the increase.

From the line graph above, we can see the trends more clearly as to how the events have grown considerably. In order to further look at how fatalities and injuries have risen, let’s take a look at a few more graphs.

With so many events each year, it would be natural to expect the number of fatalities to increase yearly as well. Let’s plot the fatalities over the years.

The number of fatalities rises at the same pace as the number of events. However, in recent times it has dipped, raising the question of whether the government has carried out some kind of mitigation over the years. A closer look at INJURIES gives us more insight into the results. If it follows the same trend, it would suggest that considerable effort has gone into educating the public about how to mitigate the destructive effects of the weather.

Another insight I wish to explore is how event counts differ from state to state. For this, I aggregated the data by state to plot a few more results.

Let’s take a look at the state-wise event count. This will tell us which state has been hit by the largest number of events between 1950 and 2011. It is important to know which states have been affected the most, as this forms a basis for raising awareness among the public about the events that occur in these states.
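
A state-wise aggregation of this kind might be sketched as follows. The function name state_totals is hypothetical and the toy rows are made up, so the numbers say nothing about the real data.

```r
# Hypothetical helper: events, fatalities and injuries summed per state
state_totals <- function(df) {
  totals <- aggregate(
    list(events     = rep(1L, nrow(df)),
         fatalities = df$FATALITIES,
         injuries   = df$INJURIES),
    by = list(state = df$STATE),
    FUN = sum
  )
  # States with the most events first
  totals[order(-totals$events), ]
}

# Made-up rows: one state has more events, the other more fatalities
toy <- data.frame(
  STATE      = c("TX", "TX", "TX", "IL"),
  FATALITIES = c(0, 1, 0, 5),
  INJURIES   = c(2, 0, 1, 9),
  stringsAsFactors = FALSE
)
state_totals(toy)
```

The toy output illustrates why the event count and the fatality count are examined separately: the state with the most events need not be the one with the most deaths.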

Clearly, the state of Texas has seen a lot of events over the years. However, to get an intuition of which state has been hit by the most fatal events, let’s take a look at state-wise population damage. It would be reasonable to expect that Texas also has the most fatalities, but we cannot comment until we look at the graph.

The state of Illinois has been most severely hit by weather events, which have caused many deaths. This could give the Illinois government data to support its residents during future natural events. Texas is in second place, which tells us that it has also been hit by some of the worst events.

Property damage, on the other hand, shows data consistent with the further analysis at the end of the report. We find that Kentucky is the worst hit in terms of property damage. The histogram of crop damage over the years is shown below:

Let’s look at a few scatter plots to identify whether there is any relationship between a few features.

We begin by comparing two features, fatalities and injuries. We get the following result:

The graph shows a roughly linear relationship between the two variables, which makes sense. Suppose a person is injured in an unforeseeable calamity such as a flood or a lightning strike. If the injury is serious enough, he or she could succumb to it, which turns the injury into a fatality.
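
A visual impression of linearity can be backed by a number. The sketch below uses made-up toy values, not the storm data, to show how a correlation coefficient and a least-squares line could be computed for the two columns.

```r
# Toy values, not taken from the storm data
toy <- data.frame(
  FATALITIES = c(0, 1, 2, 3, 5),
  INJURIES   = c(1, 4, 9, 12, 21)
)

# Pearson correlation: values near 1 indicate a strong linear relationship
r <- cor(toy$FATALITIES, toy$INJURIES)

# Least-squares line of injuries against fatalities
fit <- lm(INJURIES ~ FATALITIES, data = toy)

r
coef(fit)
```

On the real data the same two calls would take smallData in place of toy; a coefficient well below 1 would suggest the relationship is weaker than the plot makes it look.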

Let’s look at the relationship between property damage and crop damage:

The x-axis depicts property damage, while the y-axis shows the corresponding crop damage. I have faceted according to event category to get more perspective on the relationship within each category.

For the most part, we find that convection events cause a lot of both property damage and crop damage. This is because such events include tornadoes and severe thunderstorm winds, which have a disastrous effect on both. Hence we see a smooth linear regression curve in the first panel.

Events like extreme temperatures have a larger effect on crops than on property, as shown in the second panel. The plot above thus gives us a perspective on the economic and health damage we can expect from these weather events.

Multivariate Plots

This section covers more detailed plots and combines multiple scatter plots to find relationships between variables quickly.

The figure above shows the latitudes and longitudes where fatalities are concentrated. Because the blue dots are heavily overplotted, most of the data appears to show only a small number of fatalities. To get a better perspective, we will use the maps package to plot the fatalities on a map of the USA.

Figure above shows multiple deaths in the case of Convection Events and Flood Events.

The figure above shows the relationship between the various kinds of damage and fatalities. The aim is to find relationships between the economic and health-related damage that weather events cause. Insights:

  • Fatalities and crop damage have no relationship whatsoever, as expected.
  • Injuries and crop damage once again have no relationship whatsoever.
  • Injuries and fatalities seem to have an almost linear relationship with property damage.
  • Property and crop damage seem to follow a direct relationship in some places.

On reviewing all the insights from the graphs and exploratory data analysis, the following statements can be made:

  • Washington suffers from a large number of convection events.
  • Convection events cause the most damage in terms of health.
  • Flooding events have the most effect on property and crops.

Final Plots and Results

Which severe weather events cause the largest number of incidents to the population?

Calculate a total of all fatalities and injuries, so that we can find what events have the highest number of incidents. The new column is called incidents.

smallData$incidents = smallData$FATALITIES + smallData$INJURIES

We create a new set with the aggregate of the incidents grouped by the event types.

incidentData <- aggregate(list(incidents = smallData$incidents),
                          by = list(event = smallData$category), 
    FUN = sum, na.rm = TRUE)

Here is the overview of the event categories with the number of incidents

incidentData$event <- reorder(incidentData$event, -incidentData$incidents)
ggplot(incidentData, aes(y = incidents)) + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  geom_bar(aes(x = event), data = incidentData, stat = "identity") + 
    ggtitle("Fatalities and Injuries") +
  xlab("Event Category") + ylab("No. of Fatalities and Injuries")

Clearly the convection events (Lightning, Tornadoes, Thunderstorm Wind, Hail) have the greatest effect on injuries and fatalities.

Which severe weather events cause the largest amount of damage to the economy?

Add a field with the total of all damage, to both property and crops, so that we can find out which events cause the highest amount of damage. The new column is called damage and is in billions of dollars.

smallData$damage = ((smallData$propertydamage + smallData$cropdamage)/1e+09)

We create a new set with the aggregate of the damage grouped by the event types.

damageData <- aggregate(list(damage = smallData$damage),
                        by = list(event = smallData$category), 
    FUN = sum, na.rm = TRUE)

Here is the overview of the event categories with the amount of damage

damageData$event <- reorder(damageData$event, -damageData$damage)
ggplot(damageData, aes(y = damage)) +
  theme(axis.text.x = element_text(angle = 90, 
    hjust = 1)) +
  geom_bar(aes(x = event), data = damageData, stat = "identity") + 
    ggtitle("Property and crop damage") + xlab("Event Category") +
  ylab("Amount of damage (billions of $)")

Clearly the flooding events (Flash Flood, River Flood) have the greatest effect on property damage and crop damage.

After converting the data into an aggregate form and replacing state abbreviations with their names, I built a new CSV file containing the aggregate data and removed the states that were not present in the maps package.
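
The abbreviation step is not shown in this report. One way to sketch it uses the built-in base R vectors state.abb and state.name. The helper name abbr_to_state is hypothetical, and lower-casing the names is an assumption made so that they match the lowercase region names used by the state map in the maps package.

```r
# state.abb and state.name are built-in base R vectors for the 50 states.
# Codes that are not in state.abb come back as NA.
abbr_to_state <- function(abbr) {
  tolower(state.name[match(abbr, state.abb)])
}

# Made-up aggregate rows; PR stands in for a code with no state name
toy <- data.frame(
  STATE     = c("TX", "IL", "PR", "KY"),
  incidents = c(10, 7, 3, 5),
  stringsAsFactors = FALSE
)
toy$region <- abbr_to_state(toy$STATE)

# Drop codes with no matching state name before joining to map data
toy <- toy[!is.na(toy$region), ]
toy
```

The District of Columbia is not part of state.abb, so it would be dropped by this sketch unless it is handled separately.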

The figure above shows the total damage to health, i.e. lives. As we can see, the most affected states are the western states, because they are victims of events like tornadoes and hurricanes. Let us now look at the damage to the economy, which includes property and crop damage.

As we can see, the western and central states of the USA have been hit hard, causing huge losses to the economy over the years 1950-2011.

Which natural disasters are the most costly (in-depth analysis)?

Looking at individual event types, tropical storms and hurricanes are among the most dangerous events and cause the most destruction to property. In the categorization used here they fall under the tropical cyclone category, while the trends above show that convection events occur more often on average than any other category. Relief measures for both kinds of events are therefore a must to mitigate the damage done.

Which natural disasters cause the most damage to crops (in-depth analysis)?

The most obvious answer to this question would be either floods or droughts, as on average they cause the most damage to vegetation. The graph confirms this intuition: drought and flood are the major causes of damage to crops.
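
The rankings behind the two in-depth questions are not shown as code. A minimal sketch of ranking individual event types by a damage column might look like this. The function name top_events_by is hypothetical and the toy figures are made up.

```r
# Hypothetical helper: total a damage column per EVTYPE and keep the top n
top_events_by <- function(df, column, n = 5) {
  totals <- aggregate(
    list(total = df[[column]]),
    by = list(event = df$EVTYPE),
    FUN = sum, na.rm = TRUE
  )
  head(totals[order(-totals$total), ], n)
}

# Made-up rows, not values from the storm data
toy <- data.frame(
  EVTYPE         = c("DROUGHT", "FLOOD", "DROUGHT", "HAIL", "FLOOD"),
  cropdamage     = c(5e8, 2e8, 3e8, 1e7, 1e8),
  propertydamage = c(0, 4e9, 0, 2e7, 1e9),
  stringsAsFactors = FALSE
)
top_events_by(toy, "cropdamage", n = 2)
top_events_by(toy, "propertydamage", n = 2)
```

Ranking by EVTYPE rather than by category is what makes single event types such as drought visible; the caveat is that misspelled or duplicated EVTYPE values split their totals across several rows.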

Reflections

  • There were many challenges in this project, particularly because so much cleaning was needed to get the data ready for analysis. I believe I may have approximated some of the data; however, my initial EDA agrees with the final conclusions I have given.
  • A particular point about this dataset is that it contains many events that do not contribute much damage, which leads to a large amount of redundant data.
  • As far as improving the analysis is concerned, I believe a few more correlations between the features would provide more insight into the data set.
  • My analysis has been a broad one. More intricate details could be covered within this data set, leading to better results and conclusions.
  • Few features were used, because a large number of values are not interpretable and have been averaged out.
  • The most challenging aspect of this data set was the amount of cleaning to be done, even just to obtain the choropleths.